BMC Medical Genomics — Latest Matching Preprints

1

Translational bioinformatics and machine learning framework for biomarker discovery, disease prediction, and patient profiling for precision medicine

Ahmed, Z.; Govindareddy, P.; DeGroat, W.; Narayanan, R.; Peker, E.; Zeeshan, S.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353961 medRxiv

Top 0.1%

12.3%

Precision medicine aims to move healthcare from a "one-size-fits-all" approach to personalized and predictive care across diverse populations. It promotes integration of multi-omics and phenotypic data to understand disease mechanisms and discover novel biomarkers and risk factors, which could be used to predict and prevent critical diseases in individual patients. A precision medicine approach can accelerate our ability to classify patients at higher risk of developing critical diseases, improve diagnostic capabilities, develop a deeper understanding of individual risk, investigate racial differences and demographic characteristics, and find relationships between genetic variants, expressions, and diseases. This study focuses on implementing an innovative, data-driven framework of translational bioinformatics and Machine Learning (ML) techniques to analyze multi-omics data, including RNA-seq and Whole-Genome Sequencing (WGS) data, generated from blood samples of randomly consented patients. First, we utilized bioinformatics pipelines to identify differentially expressed genes and their pathogenic and likely pathogenic variants for downstream data analysis, annotation, and visualization. Then, we applied a nexus of ML models for multi-omics biomarker discovery, disease prediction, density-based clustering, single-patient profiling, and pathogenicity classification. WGS data analysis supported the exploration of genetic variation and diversity among patients to identify known and novel biomarkers, whereas RNA-seq data analysis improved our understanding of the functional and biological pathways underlying disease states. We classified and clustered pathogenic variants and expressions across various genes and discovered numerous leading disease risk factors. Our results include gene-disease associations and capture common pathways across the broader population, demonstrating a level of sensitivity and accuracy that has broad clinical implications. We validated our results against clinical records and state-of-the-science literature. This study delves into the strengths of multi-omics data integration and the capabilities of ML application in genetically diverse and complex patient cohorts. Our approach has the potential to elucidate complex gene-disease interactions for genetically diverse populations, which can support earlier diagnoses for patients in many disease realms.

2

Integration of single-cell and bulk RNA sequencing reveals TREM1 as a promising biomarker and therapeutic target for gouty arthritis

Jinfeng, W.; Jiarui, Z.; Hongbin, Q.

2026-05-20 public and global health 10.64898/2026.05.15.26353351 medRxiv

Top 0.1%

7.0%

Objective: This study aimed to systematically screen for potential candidate biomarkers and identify therapeutic targets associated with gouty arthritis (GA) through integrated analyses of single-cell and bulk RNA sequencing (RNA-seq) data. Methods: The single-cell dataset GSE211783 and the bulk RNA-seq dataset GSE160170 were analyzed using a series of bioinformatic approaches, including cell clustering, differential expression analysis, immune cell infiltration assessment, protein-protein interaction (PPI) network construction, gene set enrichment analysis, and drug sensitivity evaluation. To establish an animal model of GA, monosodium urate crystals were injected intra-articularly into experimental mice. Joint swelling was evaluated, and morphological changes in joint tissues were analyzed through hematoxylin-eosin staining. The presence of TREM1-positive cells was detected by immunohistochemistry, and the level of TREM1 protein expression in joint tissues was assessed by Western blotting. Results: We identified 102 differentially expressed genes (DEGs) and 14 signaling pathways associated with GA. The PPI network revealed 25 hub genes, of which 17 (including TREM1, TNF, PTGS2, and NLRP3) were highly expressed and 8 (including FCGR3B and CXCR6) showed low expression in the GA samples. These genes correlated significantly with the infiltration levels of macrophages. Among the hub genes, TREM1 was selected for further validation because it correlated significantly with all 14 differential pathways. In animal experiments, GA mice developed marked joint swelling and inflammatory tissue injury, along with a significant increase in TREM1-positive cells and TREM1 protein expression. Conclusion: Integrative analysis of single-cell and bulk RNA-seq data identified 102 GA-related DEGs and 14 key pathways, from which 25 hub genes were screened. TREM1 is significantly upregulated in GA and may be linked to macrophage function, providing new insights into biomarker and therapeutic target discovery for GA.

3

Identification of drug candidates for rescue of SOX17 gene targets in pulmonary arterial hypertension

Vasilaki, E.; Akosman, B.; Song, S.; Walters, R.; Sharma, Y.; Pereira, M.; Keles, M.; Mykytyuk, N.; Maude, H.; Singh, N.; Field, G.; Ventetuolo, C. E.; Howard, L.; Aman, J.; Wilkins, M. R.; Klinger, J. R.; Zhao, L.; Cebola, I.; Liang, O.; Rhodes, C. J.

2026-05-21 pharmacology and toxicology 10.64898/2026.05.14.725284 medRxiv

Top 0.1%

6.2%

Background: Both rare and common variants in the SRY-Box Transcription Factor 17 (SOX17) locus are associated with pulmonary arterial hypertension (PAH). SOX17 dysregulation leads to pulmonary artery endothelial cell (PAEC) dysfunction and the obstructive remodelling that characterises PAH. Hypothesis: Impaired SOX17 expression contributes to the pathogenesis of PAH. Restoring the function of SOX17 or its downstream targets using compounds that mimic its transcriptomic signature will rescue PAEC dysfunction and prevent PAH development. Methods and Results: We defined thousands of genes with direct SOX17 genomic binding sites and identified important potential binding partners, including ETS-transcription factors such as ERG, by ChIP-seq in PAECs. Through the integration of three PAEC RNA-seq datasets involving overexpression and silencing of SOX17, we defined a robust SOX17 transcriptomic signature. In PAH patients, circulating plasma protein levels of 10 SOX17 signature genes were associated with the SOX17 common risk variants. This included EFNB2 and UNC5B; knockdown of these genes altered the viability and apoptosis of PAECs in response to TNF treatment. The drug-transcriptome database Connectivity Map (CMap) was used to predict novel potential therapeutic compounds to correct the SOX17 transcriptomic signature. Five compounds were selected for in vitro testing and were able to partially reinstate SOX17 target gene expression in PAECs. One compound, BX-912, was selected for in vivo testing as it corrected the levels of multiple target genes, including suppressing Runt-related transcription factor-1 (RUNX1). BX-912 blocked the development of pulmonary hypertension in mice lacking the SOX17 enhancer associated with human disease. Conclusion: We have demonstrated the therapeutic potential of targeting SOX17 in PAH through correction of its gene targets, identifying BX-912 as a lead compound with in vivo efficacy.

4

Evo 2 Predicts Cardiomyopathy-Associated Variants and Elucidates Their Underlying Mechanisms

Kurozumi, A.; Otsuka, N.; Masamichi, I.; Kawakami, T.; Isagawa, T.; Kodera, S.; Takeda, N.

2026-05-17 genomics 10.64898/2026.05.15.725304 medRxiv

Top 0.1%

3.7%

Background: Although advances in next-generation sequencing have accelerated the identification of genetic variants in cardiomyopathy, interpreting variants of uncertain significance (VUS) remains a clinical challenge. Evo 2 is a high-resolution genomic artificial intelligence model capable of predicting pathogenicity across large sequence contexts and enabling mechanistic interpretation; however, its application in cardiovascular genetics is limited. Here, we evaluated the utility of Evo 2 for assessing the pathogenicity and underlying mechanisms of cardiomyopathy-associated variants. Methods: We used Evo 2 to predict the pathogenicity of single-nucleotide variants in cardiomyopathy-related genes listed in ClinVar. We assessed the ability of the model to identify characteristic structural features in both coding and noncoding regions using internal representations such as embeddings, and to infer the molecular mechanisms of variants within these regions. Results: Evo 2 demonstrated high predictive accuracy for pathogenicity, achieving an AUROC of 0.983 and an AUPRC of 0.915. Notably, sparse autoencoders (SAEs) trained on embeddings identified features corresponding to higher-order structural features, including coiled-coil and actin-binding domains characteristic of cardiomyopathy-related proteins, and accurately detected mutations known to disrupt these domains. The model recognized the binding motif of the cardiac-enriched transcription factor TBX5 with SAEs and accurately predicted a single-nucleotide polymorphism affecting TBX5 binding affinity after supervised fine-tuning. Conclusions: Evo 2 demonstrated strong performance for both predicting pathogenicity and extracting biological features of cardiomyopathy-associated variants. It may represent a powerful emerging tool for evaluating VUS in cardiovascular medicine.

5

A proteomic polygenic score to identify IL-18 driven inflammatory bowel disease

Turchin, M. C.; Raghupathy, N.; Carty, C. L.; Morris, M.; Maranville, J. C.; Holzinger, E. R.

2026-05-21 genetic and genomic medicine 10.64898/2026.05.18.26353508 medRxiv

Top 0.1%

3.7%

High levels of IL-18 have been causally implicated in IBD risk and may represent a unique mechanism driving IBD that has yet to be therapeutically targeted. To identify individuals predisposed to increased levels of IL-18, we implemented a polygenic approach to predict IL-18 plasma protein levels. Using a dataset of over 50,000 individuals with both genetic data and plasma protein levels from Olink, we developed a 27-SNP polygenic score (PGS) that predicts IL-18 levels and IBD risk. Further, we identified a threshold to classify patients as 'IL-18 High' using a data-driven approach that optimized prediction of both IL-18 and IBD risk. We show that ~30% of the overall IBD patient population is 'IL-18 High', meaning they carry a genetic predisposition towards higher protein levels. The IL-18 PGS and corresponding threshold have the potential to identify IBD patients with IL-18-driven IBD who may respond more effectively to a therapy targeting this mechanism.
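
As a rough illustration of the general technique named in this abstract, a polygenic score is commonly computed as a weighted sum of effect-allele dosages and then compared against a cutoff. The sketch below is not the authors' method: the function names, the three SNP weights, and the threshold are all hypothetical placeholders, and the abstract does not state how its 27 weights or its 'IL-18 High' threshold were derived.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (typically 0, 1, or 2 per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def is_high(score: float, threshold: float) -> bool:
    """Classify an individual as 'high' when the score meets the cutoff."""
    return score >= threshold


# Hypothetical 3-SNP example; real weights would come from a fitted model.
example_weights = [0.5, -0.25, 1.0]
example_dosages = [2, 1, 0]
example_score = polygenic_score(example_dosages, example_weights)
example_label = is_high(example_score, threshold=0.5)
```

In practice the threshold would be tuned on held-out data, which is the step the abstract describes as data-driven.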

6

Comprehensive Profiling of Age- and Immune Cell-Specific Signaling Activation Using Multiplex Phosphoflow

Hadlova, P.; Svaton, M.; Kochmannova, K.; Korzhenevich, J.; Schmidt, F.; Neys, S. F. H.; Bott, M.-T.; Vrabcova, P.; Staniek, J.; Bloomfield, M.; Kalina, T.; Rizzi, M.

2026-05-27 immunology 10.64898/2026.05.24.727113 medRxiv

Top 0.1%

3.7%

Immune phenotyping is a pillar of diagnostics, characterization of new genetic defects, and understanding of disease mechanisms. Cell population distribution often does not capture the intrinsic functional changes that may contribute to disease. The outcome of signaling activation can be used as a proxy for cell function. To overcome limitations in sample availability and standardization of signaling assays, we developed a multiplex full spectrum cytometry phosphoflow assay allowing the study of 6 phospho-proteins representing BCR/TCR, MAPK, PI3K/Akt/mTOR and canonical NF-κB signaling pathways in 18 immune cell subpopulations. Maximal stimulation and temporal dynamics were studied in response to pan-stimuli, which activate cells regardless of receptor, and targeted stimuli for T, B, and innate immune cells. We studied healthy individuals aged 1-69 years and discovered subpopulation-specific responses. Furthermore, pediatric donors showed broad differences in B cell and T cell function compared to adults. Hence, we established a tool to assess multiple signaling pathways at once and provide age- and subpopulation-specific references for signaling outcome. Summary: A multiplex full spectrum flow cytometry-based phosphoflow assay across 18 immune cell subpopulations and 6 phospho-proteins, in response to 6 stimuli at 4 time points in individuals aged 1-69 years, reveals distinct age- and subpopulation-associated signaling patterns in the magnitude and dynamics of pathway activation.

7

Large-scale association study identifies lung cancer susceptibility copy number variants and their potential functional role in genetic instability

Xiao, F.; Qin, F.; Luo, X.; Slewitzke, S. E.; Fernandes, G. F.; Johansson, M.; Xiao, X.; Zaridze, D.; Bojesen, S. E.; Shete, S.; Albanes, D.; Aldrich, M. C.; Tardon, A.; Fernandez-Tardon, G.; Le Marchand, L.; Rennert, G.; Bickeböller, H.; Wichmann, H.-E.; Risch, A.; Muley, T.; Rosenberger, A.; Field, J. K.; Davies, M.; Woll, P.; Kiemeney, L. A.; Haugen, A.; Zienolddiny, S.; Lam, S.; Johansson, M.; Grankvist, K.; Schabath, M. B.; Andrew, A.; Lazarus, P.; Arnold, S. M.; Zhu, D.; Brenner, H.; Neuhouser, M. L.; Hung, R. J.; Christiani, D. C.; McKay, J.; Cai, G.; Xia, J.; Amos, C. I.

2026-05-15 genetic and genomic medicine 10.64898/2026.05.11.26352741 medRxiv

Top 0.2%

3.3%

Background: Genome-wide association studies (GWAS) have identified numerous lung cancer susceptibility loci based on single nucleotide polymorphisms (SNPs), yet a substantial proportion of heritability remains unexplained. We therefore evaluated germline copy number variants (CNVs) as an underexplored source of genetic susceptibility and potential contributors to genomic instability in lung cancer. Methods: We conducted a genome-wide analysis of germline CNVs using 19,342 cases and 15,917 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, with replication in two independent cohorts. High-confidence CNVs were identified by integrating two CNV callers including PennCNV and modSaRa2. Association analyses were performed using both gene-based and CNV region-based approaches. Polygenic risk scores (PRS) were constructed from top loci, and functional validation was conducted using siRNA-mediated knockdown in lung fibroblast cells. Results: We identified CNVs in four genomic regions (1p36.22, 2q31.2, 6p21.32, and 19q13.32) significantly associated with lung cancer risk. Two loci (1p36.22 and 2q31.2) were consistently supported across both analytical strategies. A CNV-based PRS constructed from key genes (CLCN6, NFE2L2, OPA3, and PSMB8) was significantly associated with lung cancer risk and replicated across independent datasets. Functional assays demonstrated that knockdown of NFE2L2 and OPA3 increased endogenous DNA damage, supporting a role in genomic stability. Conclusions: Germline CNVs contribute to lung cancer susceptibility and may influence carcinogenesis through mechanisms related to genomic instability. Impact: These findings expand the genetic architecture of lung cancer and highlight CNVs as potential biomarkers for improving risk stratification and informing precision prevention strategies.

8

Bacterial Virulence Genes Detected by Metagenomic Sequencing in the Cystic Fibrosis Airway Microbiome

Valluri, M. L.; Harmon, B.; Burrell, A.; Hahn, A.

2026-05-19 microbiology 10.64898/2026.05.19.726200 medRxiv

Top 0.2%

2.6%

Background: Cystic fibrosis (CF) is an autosomal recessive genetic disorder that leads to chronic infection and mucus retention in the lungs, with lung function gradually deteriorating through recurrent pulmonary exacerbations (PEx). Virulence factors (VFs) of Pseudomonas aeruginosa and Staphylococcus aureus are thought to contribute to pulmonary exacerbations. Our study objective was to identify VF genes related to PEx, high Pseudomonas abundance, and high Staphylococcus abundance in persons with CF (pwCF). Methods: This was an ancillary study of pwCF treated with IV antibiotics for PEx between 2016 and 2020 at Children's National Hospital. Using shotgun metagenomics and ShortBRED, we identified bacterial VF genes and used DESeq2 to determine differential expression of VF genes across comparators. Results: Twenty-two pwCF experienced 43 PEx. The study cohort had a mean age of 14.6 years; 41% were female, 59% white, and 36% Hispanic, and 45% had an F508del homozygous CFTR mutation. Minimal differences in VF gene abundance were identified across clinical state. The most differentially increased VF genes found in Pseudomonas-high samples were associated with an aminotransferase (log2FC 25.9), flagellar biosynthesis (log2FC 8.3), and type VI secretion systems (log2FC 8.2). The most differentially increased VF genes found in Staphylococcus-high samples were an exotoxin (log2FC 26.7), macrolide phosphotransferase (log2FC 25.8), pathogenicity island proteins (log2FC 25.2 and 24.7), and VOC family proteins (log2FC 24.8). Conclusions: These findings demonstrate that specific VFs associated with immune modulation, motility secretion systems, bacterial motility, and antibiotic resistance are related to P. aeruginosa and S. aureus abundance, providing potential targets for more personalized antimicrobial interventions.

9

Integrated serum proteomics and autoantibody analyses reveal a biomarker signature predictive of flare during biologic tapering in rheumatoid arthritis

Blanco, F. J.; Quaranta, P.; Dominguez-Guerrero, P.; Calamia, V.; Fernandez-Puente, P.; Paz-Gonzalez, R.; Balboa-Barreiro, V.; Noriega, D.; Galindo, L.; Acasuso, B.; Oreiro, N.; Rojo, R.; Lourido, L.; Ruiz-Romero, C.

2026-05-19 molecular biology 10.64898/2026.05.19.726198 medRxiv

Top 0.3%

2.1%

Background: Rheumatoid arthritis (RA) is a chronic immune-mediated inflammatory disease characterized by a heterogeneous clinical course with periods of remission and flare. Although biologic DMARDs (bDMARDs) have revolutionized RA treatment by enabling sustained disease control, their long-term use is associated with adverse effects and high costs, making dose tapering an attractive but clinically challenging strategy. The lack of reliable biomarkers to predict flare risk limits safe implementation of treatment de-escalation. This study aimed to identify novel circulating protein biomarkers associated with flare risk in RA patients undergoing bDMARD tapering, to enable biomarker-guided treatment optimization strategies. Methods: A discovery proteomic analysis using mass spectrometry was performed on baseline serum samples from a subset of the OPTIBIO clinical trial (n=44), followed by validation in the full cohort (n=194) using ELISA. Functional pathway analysis explored biological processes associated with candidate biomarkers. In parallel, anti-cytokine autoantibodies were profiled using multiplex immunoassays. Logistic and Cox regression models were used to assess associations with flare risk. Predictive models integrating biomarkers and clinical variables were evaluated using receiver operating characteristic (ROC) analysis, sensitivity and specificity metrics, and decision curve analysis to assess clinical utility. Results: Mass spectrometry identified 806 proteins, of which 87 were differentially expressed at baseline between patients who flared and those who maintained remission during follow-up within the intervention (tapering) arm. Functional enrichment analysis highlighted immune-regulatory and innate immune pathways. Among the candidates, V-set immunoglobulin-domain-containing 4 (VSIG4) was validated as a biomarker associated with increased flare risk. Anti-interferon-γ (anti-IFNγ) autoantibodies were also associated with flare. A combined model including VSIG4, anti-IFNγ, and the clinical variable DAS28-CRP improved predictive performance compared with clinical variables alone (AUC 0.76 vs 0.66), achieving significantly higher sensitivity. Decision curve analysis demonstrated higher net benefit of the combined model, indicating improved clinical decision-making. In a secondary analysis focused on patients with prolonged remission, representing the most suitable candidates for safe treatment tapering, the model performance further improved (AUC 0.84). Conclusion: Integration of novel serum proteomic and autoantibody biomarkers with clinical parameters improves prediction of flare during biologic tapering in RA and provides clinically relevant benefit for patient stratification. These findings support further development of biomarker-driven approaches for personalized treatment optimization strategies.
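
For readers unfamiliar with decision curve analysis, the net benefit it reports is conventionally defined at a threshold probability pt as TP/n - (FP/n) * pt / (1 - pt). The sketch below computes that standard quantity for a single threshold; it is an illustration under stated assumptions rather than the study's analysis code, and the function name and toy inputs are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    y_true holds 1 for an event (e.g. flare) and 0 otherwise.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    # True positives: flagged by the model and the event occurred.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged by the model but no event occurred.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical): four patients, two of whom had the event.
toy_outcomes = [1, 0, 1, 0]
toy_risks = [0.9, 0.6, 0.4, 0.1]
toy_nb = net_benefit(toy_outcomes, toy_risks, threshold=0.5)
```

A decision curve plots this value over a range of thresholds and compares it with the "treat all" and "treat none" strategies.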

10

A Foundational Exome Resource for Jordan: Dual Ancestry Admixture and Population-Specific Variants to Improve Clinical Variant Interpretation

Froukh, T.

2026-05-27 genetic and genomic medicine 10.64898/2026.05.23.26353895 medRxiv

Top 0.3%

2.1%

Currently, the genetic architecture of Middle Eastern populations is underrepresented in global genomic databases. This gap increases the rate of Variants of Uncertain Significance (VUSs) and clinical misinterpretations of genomic data, especially in Middle Eastern populations. Whole exome sequencing was conducted on 90 healthy individuals from Jordan, and the data were analysed using Principal Component Analysis (PCA) and multi-computational filtering. PCA revealed a double ancestry (EUR-AFR) admixture rather than a triple admixture (EUR-AFR-AMR). More than 3,500 population-specific variants (PSVs) were identified, of which 72% were singletons. Additionally, 19 variants were significantly enriched compared to the maximum allele frequencies in public global databases (Fisher's exact test with Benjamini-Hochberg false discovery rate correction, p-value < 0.05). Consequently, the results suggest reclassifying the VUSs that reside in the ECE2 gene as likely benign, and the variants with Conflicting Classification of Pathogenicity in the genes IL1RN and THPO as benign, based on the significant allele frequency (AF=0.0389, p-value < 0.05). Furthermore, a pathogenic ClinVar variant was identified in a healthy individual, warranting careful interpretation. The findings underscore the importance of identifying PSVs in order to minimize or even prevent clinical misdiagnosis and highlight the unique genetic signature in Jordan. The study serves as a foundational resource for precision medicine in the region.
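
The enrichment test mentioned in this abstract can be sketched as follows: a Fisher's exact test on a 2x2 allele-count table per variant, followed by Benjamini-Hochberg adjustment across variants. This is a minimal standard-library illustration, not the author's pipeline. It assumes a one-sided test for enrichment in the cohort (the abstract does not state sidedness), and the function names and toy counts are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for enrichment in the first row.

    Table layout: row 1 = cohort (a alt alleles, b ref alleles),
    row 2 = reference database (c alt alleles, d ref alleles).
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (hypothetical counts): 3 of 4 cohort alleles vs 1 of 4 reference alleles.
toy_p = fisher_enrichment_p(3, 1, 1, 3)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Variants whose adjusted p-value falls below the chosen level (0.05 in the abstract) would be reported as significantly enriched.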

11

Building an Interoperable Rare Disease Multi-omic Resource: The GREGoR Data Model and Dataset

Heavner, B. D.; Wheeler, M. M.; Bengtsson, J. D.; Carvalho, C. M. B.; Cheung, W. A.; Conomos, M. P.; Delot, E. C.; DiTroia, S.; Ganesh, V. S.; Gogarten, S. M.; Grochowski, C. M.; Jhangiani, S. N.; King, C. H.; LeMaster, C.; Marvin, C. T.; Marwaha, S.; Miller, D. E.; O'Donnell-Luria, A.; Pais, L.; Patterson, K.; Qi, G.; Richardson, M.; Smail, C.; Stilp, A. M.; Tong, C. C.; Ungar, R. A.; Weisburd, B.; Bamshad, M. J.; Bernstein, J. A.; Eichler, E. E.; Gibbs, R. A.; Lupski, J. R.; May, S. J.; Montgomery, S. B.; Pastinen, T.; Posey, J.; Rehm, H. L.; Shojaie, A.; Talkowski, M. E.; Vilain, E.; Wei, C

2026-05-19 genomics 10.64898/2026.05.15.725546 medRxiv

Top 0.4%

1.9%

Rare disease research and diagnosis rely on the integration of genomic and phenotypic data generated across diverse clinical sites; however, the absence of widely adopted standards for representing genomic data and associated metadata has limited data interoperability, reuse, and cross-study analysis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was established to investigate challenging rare disease cases and evaluate emerging multi-omic technologies for clinical translation. To support coordinated data integration across distributed research sites, we developed a common Consortium Data Model in partnership with domain experts to standardize the capture of participant-, family-, phenotype- and assay-level metadata. The model places particular emphasis on a modular architecture that supports linking multiple data versions from multiple omic technologies to a single individual and attributing a genetic finding to the specific technology used for its initial discovery. Adoption of the GREGoR Data Model has enabled continued generation and public release of a harmonized, analysis-ready Consortium Dataset. The most recent release includes phenotypic, family and multi-omic data from 12,292 participants in 5,029 families. Other rare disease data-sharing efforts are beginning to adopt this data model, which will facilitate cross-consortium analyses and empower rare disease research. This work demonstrates that a collaborative, flexible, and scalable data model can enable large-scale rare disease research, facilitate cross-center data harmonization, and enable data interoperability.

12

Immunometabolic Remodeling of Perivascular Adipose Tissue in Murine Lupus: Implications for Lupus Vasculopathy

Shi, H.; Weintraub, N. L.; Liu, L.; Zhang, Y.; Kim, D.; Goo, B.; Xiong, X.; Han, Q.; Annex, B. H.; Ley, K.; Carbone, L.; Kahlenberg, J. M.; Fulton, D. J. R.; Stepp, D. W.; Kim, H. W.; Lee, R.; Patel, V.; Gallo, D.; Wu, H.; Hu, T.; Ogbi, M.; Lyu, Q.; Wu, T. S.; Zhang, T.

2026-05-19 molecular biology 10.64898/2026.05.18.726104 medRxiv

Top 0.4%

1.7%

Background: Patients with systemic lupus erythematosus (SLE) face markedly increased cardiovascular disease (CVD) risk driven by mechanisms beyond traditional risk factors. Thoracic aortic perivascular adipose tissue (tPVAT) is dysfunctional in lupus and exacerbates endothelial dysfunction, yet the molecular basis of this dysfunction remains poorly defined. Methods: Integrated multi-omics profiling, including bulk RNA-seq, untargeted proteomics, lipidomics, and high-dimensional spectral flow cytometry, was performed on tPVAT from 15-week-old MRL/lpr mice (active lupus, n = 4-6) and MRL control mice (n = 5-6). Adipogenic differentiation capacity of tPVAT adipose stromal and progenitor cells (ASPCs) from MRL/lpr mice was assessed by Oil Red O staining at 5 weeks (pre-disease) and 15 weeks (active disease), with subcutaneous ASPCs used as depot controls. Results: Transcriptomic profiling of tPVAT from MRL/lpr mice identified 2,742 upregulated and 1,494 downregulated genes (adjusted p < 0.001, |log2FC| > 1), with strong activation of interferon, IL6-JAK-STAT3, and TNFA signaling pathways together with suppression of fatty acid metabolism, oxidative phosphorylation, and adipogenic pathways. Proteomic and lipidomic analyses were concordant, revealing broad downregulation of mitochondrial bioenergetic machinery, depletion of cardiolipin and acylcarnitines, and enrichment of ceramide phosphoinositols and lysophosphatidylcholines. Cardiolipin correlated strongly with the mitochondrial/metabolic protein module (r = 0.95) and inversely with the immune/inflammatory protein module (r = -0.92). Spectral flow cytometry confirmed marked CD45+ leukocyte infiltration dominated by T cells, together with a significantly reduced Treg/CD4+ ratio indicating loss of local immunoregulatory balance. ASPCs derived from PVAT of 15-week-old MRL/lpr mice exhibited impaired white and beige adipogenic differentiation, while ASPCs from PVAT of 5-week-old MRL/lpr mice, and from subcutaneous adipose tissues of 15-week-old MRL/lpr mice, had normal white and beige differentiation, consistent with an acquired, depot-specific, disease-stage-dependent progenitor defect in PVAT of MRL/lpr mice. Conclusions: Lupus tPVAT undergoes a concordant cross-platform molecular reprogramming of mitochondrial bioenergetic genes coupled with establishment of an interferon-dominant immune niche and acquired loss of ASPC adipogenic capacity. These findings provide a molecular framework for lupus PVAT dysfunction and identify restoration of mitochondrial function, suppression of interferon-driven inflammation, and renewal of progenitor differentiation as potential therapeutic strategies for lupus vasculopathy.

13

Calibrated high-throughput electrophysiology enables clinical interpretation of CACNA1G missense variants

Finol-Urdaneta, R. K.; Tan, C.-Y.; Maksemous, N.; Ma, J. G.; Lockhart, P.; Snell, P.; Malhotra, A.; Thompson, B. A.; Garg, G.; Goel, H.; Griffiths, L. R.; Adams, D. J.; Vandenberg, J. I.; Ng, C. A.

2026-05-18 neuroscience 10.64898/2026.05.10.724145 medRxiv

Top 0.4%

1.7%

Objective: Accurate classification of ion channel variants of uncertain significance (VUS) remains a persistent challenge in clinical genomics, limiting diagnostic resolution in neurological disorders. Methods: We developed a calibrated electrophysiological framework to generate functional evidence for clinical interpretation of variants in CACNA1G, which encodes the low-voltage-activated calcium channel Cav3.1. Functional metrics derived from automated patch-clamp recordings were calibrated against benign/likely benign (B/LB) and pathogenic/likely pathogenic (P/LP) reference variants to enable conservative application of ACMG/AMP functional criteria within clinical variant interpretation workflows. Results: Calibration using 25 B/LB and 16 P/LP CACNA1G variants showed that more than 80% of P/LP variants exhibited reduced current density (CD). Deactivation kinetics (τDeact) provided complementary discriminatory information by identifying gating abnormalities in variants with preserved CD. Application of this dual-metric framework to five VUS identified in Australian patients revealed two variants (Cav3.1-R186Q and R1394Q) with abnormal functional profiles consistent with voltage-sensor perturbation, supporting reassessment as likely pathogenic under ACMG/AMP guidelines. The remaining VUS displayed functional properties overlapping the benign reference distribution. Conclusion: These findings establish a calibrated functional framework for generating electrophysiological evidence that supports clinical interpretation of CACNA1G missense variants under ACMG/AMP guidelines. When applied as external functional evidence, this approach improves resolution of CACNA1G-associated VUS while maintaining conservative standards for variant classification.

14

Optical genome mapping identifies source-associated structural variant differences across early-passage human iPSCs

Namvar, L.; Sedov, K.; Yang, M. J.; Hermosillo, R.; Zafar, F.; Schuele, B.

2026-05-31 genomics 10.64898/2026.05.29.728843 medRxiv

Top 0.4%

1.7%

Background: Induced pluripotent stem cells (iPSCs) are an important model for studying human diseases in vitro. However, previous studies have shown that iPSC reprogramming and extended cell culture can introduce genomic structural variants (SVs). Technologies like karyotyping, CNV microarrays, and whole-genome sequencing have limitations in resolution, sensitivity, or the ability to detect large and complex structural variants compared to optical genome mapping (OGM). OGM is a genome-wide structural variant detection method that analyzes fluorescently labeled ultra-high-molecular-weight DNA molecules to identify copy-number and balanced rearrangements. At sufficient coverage, OGM can detect SVs of approximately ≥2 kbp and identify mosaic events supported by molecule-level evidence, offering higher resolution than conventional karyotyping or SNP-array-based QC. Here, we compared iPSC clones derived from peripheral blood mononuclear cells (PBMCs) and fibroblasts (FBCs) to determine whether the starting somatic cell source is associated with differences in structural variant burden and SV-type profiles after nuclear reprogramming into iPSCs. Results: We analyzed 73 low-passage iPSC clones generated from 25 parental lines using OGM. Compared with PBMC-iPSCs, FBC-iPSCs showed a higher SV burden, with enrichment of duplications ≥100 kbp and more frequent overlap with protein-coding genes, fragile sites, and recurrent chromosomal hotspot regions. In contrast, PBMC-iPSCs showed fewer SVs overall and a higher proportion of clones without detectable clone-specific SVs. Conclusions: OGM provides a high-resolution approach for post-reprogramming genomic quality control by detecting clone-specific structural variants of approximately ≥2 kbp, including events below the resolution of conventional cytogenetic and SNP-array-based assays. In these early-passage iPSCs, SVs overlapped protein-coding genes, fragile sites, and recurrent culture-associated chromosomal regions, underscoring the need for clone-level genomic assessment before downstream applications. FBC-derived iPSCs showed a higher SV burden, including more frequent and larger duplications, whereas PBMC-derived iPSCs more often lacked detectable clone-specific SVs. These findings suggest that PBMC-iPSCs and FBC-iPSCs can differ in post-reprogramming SV profiles and support the use of OGM as a QC strategy during iPSC generation and selection.

15

Machine learning methodology using a masked neural network for robust genetic risk score calculation from noisy and missing data

Squires, S.; Weedon, M. N.; Oram, R. A.

2026-05-20 genetic and genomic medicine 10.64898/2026.05.18.25341725 medRxiv

Top 0.5%

1.7%

Show abstract

Purpose: Genetic risk scores (GRSs) are summaries of genetic data that can improve prediction of disease risk and progression. GRSs are increasingly available but rely on high-quality input data to produce good output results; with noisy or missing inputs the GRS may be inaccurate. We aimed to develop a method to produce a robust estimate of the GRS when input data is missing, noisy or both. Approach: We developed a neural network approach, named masked-MLP, for robust GRS calculation trained on a set of GRS scores calculated on clean data. The masked-MLP includes additional input data and has noise inserted during training, both of which make the model more robust. Results: A GRS for type 1 diabetes (T1D) calculated on input data with 10% of the data corrupted had a Spearman rank correlation to the clean GRS of 0.669 (0.665-0.674) while the equivalent for the masked-MLP was 0.951 (0.950-0.952). For the same data the area under the receiver operating characteristic curve for separation of T1D from population samples fell from 0.919 (0.904-0.932) to 0.808 (0.787-0.827) for the GRS while the masked-MLP fell to 0.910 (0.895-0.924). Conclusions: The masked-MLP was more robust to noise when calculating a GRS than using standard approaches. Our approach has the potential to ensure both improved research and clinical outcomes due to more reliable GRS calculation.
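The evaluation described in this abstract (comparing a GRS computed on corrupted inputs against the clean GRS by Spearman rank correlation) can be illustrated with a minimal sketch. It does not implement the masked-MLP. The GRS is assumed here to be a plain weighted sum of risk-allele dosages, and the corruption model (replacing a random 10% of dosages with random values), the weights, the cohort size, and all function names are hypothetical choices made for illustration on synthetic data.

```python
import random

def grs(dosages, weights):
    """Assumed GRS form: weighted sum of risk-allele dosages (0, 1 or 2)."""
    return sum(d * w for d, w in zip(dosages, weights))

def corrupt(dosages, fraction, rng):
    """Hypothetical noise model: overwrite a random fraction of dosages."""
    noisy = list(dosages)
    n_bad = round(fraction * len(noisy))
    for i in rng.sample(range(len(noisy)), n_bad):
        noisy[i] = rng.choice([0, 1, 2])
    return noisy

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Synthetic cohort; none of these numbers come from the study.
rng = random.Random(0)
n_people, n_variants = 500, 60
weights = [rng.uniform(0.05, 1.0) for _ in range(n_variants)]
cohort = [[rng.choice([0, 1, 2]) for _ in range(n_variants)]
          for _ in range(n_people)]

clean = [grs(person, weights) for person in cohort]
noisy = [grs(corrupt(person, 0.10, rng), weights) for person in cohort]
rho = spearman(clean, noisy)
print(f"Spearman rho, clean vs 10% corrupted inputs: {rho:.3f}")
```

The value printed depends entirely on the synthetic setup and is not comparable to the correlations reported in the abstract.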

16

Regulatory Genomics of Preeclampsia-Specific Risk Variants Highlights Immune and Endothelial Mechanisms

Farahat, M. A.; Abbas, M.; Wiafe, G. A.; Cheairs, T. G.; Nel, M.; Gaye, A.

2026-05-29 genomics 10.64898/2026.05.26.728031 medRxiv

Top 0.5%

1.7%

Show abstract

Background: Preeclampsia (PE) is a complex hypertensive disorder of pregnancy characterized by endothelial dysfunction, immune dysregulation, and systemic vascular injury. Multiple genome-wide association studies (GWAS) have revealed genetic signals shared with hypertension and blood pressure traits, potentially obscuring biological mechanisms that are more specific to PE pathogenesis. Furthermore, the functional consequences of most PE-associated variants remain poorly understood. In addition, GWAS relies on short-read sequencing and array-based analyses, limiting the ability to identify insertions, deletions, and other structural variants that may contribute to disease-associated regulatory mechanisms. In this study, we investigated the regulatory architecture of PE-specific genetic variants and evaluated their potential linkage disequilibrium (LD) with structural variants. Methods: We integrated GWAS, transcriptomic, and long-read sequencing data to investigate the regulatory architecture of PE-specific genetic variants. Summary statistics for PE, hypertension, systolic and diastolic blood pressure were obtained from the GWAS Catalog, and variants uniquely associated with PE (P ≤ 1×10⁻⁴) were prioritized. Cis-expression quantitative trait locus (cis-eQTL) analyses were performed in whole-blood RNA-sequencing data from 180 African American women. Significant associations were replicated in biologically relevant tissues from the GTEx Project, including vascular, renal, and immune-related tissues. Long-read sequencing-derived structural variants (SVs) were subsequently evaluated for LD with replicated eQTL loci. Results: A total of 10,843 PE-specific variants, present in whole-genome sequencing data of the 180 women, were evaluated. Cis-eQTL analyses identified 480 significant eQTL-gene associations involving 277 unique variants and 192 genes (FDR ≤ 0.05). Replication analyses supported 69 eQTL-gene associations across five GTEx tissues, involving 35 variants and 14 genes. Replicated signals were enriched in vascular tissues, particularly artery tibial and artery aorta. Several prioritized genes converged on immune and vascular pathways, including MICA, HLA-DPB1, SEMA4D, JUP, ZFP57, and TMEM204. Integration of GWAS and eQTL effects demonstrated consistent regulatory shifts associated with PE-risk alleles, including downregulation of immune-related loci and upregulation of select vascular-associated genes. Long-read sequencing analyses identified 66 high-LD (r² ≥ 0.80) SNP-SV-gene associations, including 12 replicated eQTL variants, 8 candidate SVs, and 3 replicated genes, suggesting that structurally complex genomic regions may contribute to the observed regulatory signals. Conclusions: The tissues enriched in the regulatory signal highlight the importance of systemic endothelial biology in PE susceptibility. The findings of this study support a model in which PE-specific genetic susceptibility converges predominantly on interconnected immune and vascular regulatory mechanisms. The integration of eQTL analyses with long-read structural variant discovery provides additional insight into the complex genomic architecture underlying PE and highlights candidate regulatory loci that may not be adequately captured through conventional GWAS approaches alone. The study also emphasizes the importance of conducting functional genomic analyses in diverse populations to improve understanding of disease biology and advance precision medicine efforts.
Graphical abstract: Regulatory Genomics of Preeclampsia-Specific Risk Variants Highlights Immune and Endothelial Mechanisms. GWAS summary statistics for preeclampsia, hypertension, SBP, and DBP were integrated to identify 10,843 preeclampsia-specific variants that were subsequently evaluated in cis-eQTL analyses using whole-blood RNA-sequencing data from 180 African American women (left). Cis-eQTL analyses identified 480 significant associations involving 277 variants and 192 genes (FDR ≤ 0.05), of which 69 eQTL-gene associations involving 35 variants and 14 genes replicated across five GTEx tissues, with strongest enrichment observed in vascular tissues, particularly artery tibial and artery aorta (center). Prioritized genes, including MICA, HLA-DPB1, SEMA4D, JUP, ZFP57, and TMEM204, converged on interconnected immune and endothelial pathways associated with systemic vascular dysfunction, impaired placentation, and inflammatory dysregulation in preeclampsia. Integration of long-read sequencing data further identified 66 high-LD SNP-SV-gene associations involving 12 replicated eQTL variants, 8 candidate structural variants, and 3 replicated genes, suggesting that structurally complex genomic regions may contribute to regulatory mechanisms not fully captured through conventional GWAS approaches alone. eQTL indicates expression quantitative trait locus; FDR, false discovery rate; GTEx, Genotype-Tissue Expression project; SBP, systolic blood pressure; DBP, diastolic blood pressure; LD, linkage disequilibrium; SV, structural variant.
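The high-LD criterion in this entry (r² of at least 0.80 between a SNP and a structural variant) can be made concrete with a small sketch of the standard pairwise r² statistic. The entry does not say how LD was computed, so this version assumes phased haplotypes with both loci coded as 0/1 (for the SV, 1 meaning the variant is present); the function name and the example haplotypes are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the 1-allele at both loci.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom

# Hypothetical haplotypes: one SNP allele and one SV call per chromosome.
snp = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
sv = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

r2 = ld_r2(snp, sv)
print(f"r^2 = {r2:.3f}; passes r^2 >= 0.80 threshold: {r2 >= 0.80}")
```

With unphased genotype data, r² is usually estimated differently (for example from genotype correlations or an EM haplotype-frequency estimate), so this sketch covers only the phased case.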

17

Visual gamma stimulation causes prolonged enhancement of low-frequency blood flow oscillations across cortical regions in mice

Bressan, P. R.; Long, E.; Jiang, J.; Vithayathil, R.; Guan, Z.; Song, Y.; Rauscher, B. C.; Chai, N.; Kilic, K.; Erdener, S. E.; Devor, A.; Boas, D. A.; Tang, R.

2026-06-03 neuroscience 10.64898/2026.05.31.729102 medRxiv

Top 0.5%

1.5%

Show abstract

Introduction: Gamma entrainment using sensory stimuli (GENUS) uses 40Hz-pulsed sensory stimuli to entrain neural activity in the gamma band (30-150Hz). However, the effect of GENUS on low-frequency vascular oscillations has not been fully explored. Objectives: The objective of this study is to elucidate the effect of GENUS on vasomotion in healthy mice and potential confounds for future application in disease studies. Methods: Head-fixed, awake C57Bl/6 mice (n=18; 9M 9F) aged 18 to 60 weeks were exposed to white light delivered either as 40Hz visual flicker (GENUS) or as a constant stimulus (control). Blood flow was imaged using laser speckle contrast imaging (LSCI) before, during, immediately after 1 hour of stimulus, and 30min after the stimulus termination. Results: A linear mixed-effects model showed that GENUS enhanced the magnitude of 0.2-0.4Hz blood flow oscillations by 38% during stimulation and by 30% at 30 minutes after stimulation compared to control when controlled for age, sex, and other factors. The effect on vasomotion was distributed across many cortical regions not limited to visual areas and lasted beyond 24 hours post-stimulus. Conclusion: These results support the exploration of GENUS for increasing vasomotion in therapeutic contexts.

18

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.6%

1.5%

Show abstract

Background: Inherited Retinal Diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. This underrepresentation creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using large-scale deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating InterVar-automated classification based on 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥ 1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution characteristics of IRD-associated highly-mutated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive (AR)-IRD genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant (AD)-IRDs exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2). 
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
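The abstract does not give the formulas behind its carrier frequency (CF) and genetic prevalence (GP) estimates. A common approach, sketched here purely as an assumption, applies Hardy-Weinberg equilibrium per gene: with q the summed allele frequency of P/LP variants in a gene, CF = 2q(1 - q) and GP = q². The cumulative CF is then the probability of carrying a variant in at least one gene (treating genes as independent), and the cumulative GP is the sum of per-gene values. The gene labels and allele frequencies below are hypothetical placeholders, not ChinaMAP data.

```python
from math import prod

def carrier_frequency(q):
    """Heterozygous carrier frequency under Hardy-Weinberg: 2q(1 - q)."""
    return 2 * q * (1 - q)

def genetic_prevalence(q):
    """Expected frequency of biallelic (affected) genotypes: q^2."""
    return q * q

def cumulative_carrier_frequency(qs):
    """P(carrier for at least one gene), assuming genes are independent."""
    return 1 - prod(1 - carrier_frequency(q) for q in qs)

def cumulative_genetic_prevalence(qs):
    """Sum of per-gene prevalences (overlap between genes is negligible)."""
    return sum(genetic_prevalence(q) for q in qs)

# Hypothetical per-gene summed P/LP allele frequencies.
allele_freqs = {"GENE_A": 0.010, "GENE_B": 0.005}

cf_total = cumulative_carrier_frequency(allele_freqs.values())
gp_total = cumulative_genetic_prevalence(allele_freqs.values())
print(f"cumulative CF: 1 in {1 / cf_total:.2f}")
print(f"cumulative GP: 1 in {1 / gp_total:.2f}")
```

Expressing both quantities as "1 in N" mirrors how the abstract reports them, although the study's actual pipeline may differ from this sketch.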

19

Reappraisal of GPR40/FFAR1 as a Therapeutic Target for Type 2 Diabetes Mellitus: Systematic Cheminformatic Analysis of 2,637 Compounds in ChEMBL 36 Identifies Superior Candidates to Fasiglifam

TANG, W.; ZHANG, Z.

2026-05-21 pharmacology and toxicology 10.64898/2026.05.19.726272 medRxiv

Top 0.6%

1.4%

Show abstract

Background: The discontinuation of Fasiglifam (TAK-875), a GPR40/FFAR1 full agonist, during Phase 3 clinical trials due to hepatotoxicity led to widespread abandonment of GPR40 as a viable therapeutic target for type 2 diabetes mellitus (T2DM). However, mechanistic evidence suggests that Fasiglifam's hepatotoxicity arises from mitochondrial liability driven by high lipophilicity (aLogP = 5.31), rather than from on-target GPR40 signaling. We hypothesized that target-level failure was incorrectly inferred from compound-level safety concerns, and that superior candidates exist within publicly available databases. Methods: We queried ChEMBL Release 36 (28 GB SQLite, 74 tables) for all compounds with documented GPR40/FFAR1 activity (UniProt: O14842). Compounds were filtered by EC50 ≤ 10 nM in nM units with standard relation "=". Drug-likeness was assessed using Lipinski's Rule of Five (Ro5), aLogP, molecular weight (MW), hydrogen bond donors/acceptors (HBD/HBA), and polar surface area (PSA). A parallel analysis of Therapeutic Target Database (TTD v10.1.01, 4,298 targets) provided clinical context. A real-world evidence (RWE) patient stratification framework was constructed using EMR data from tens of millions of patients with >10 years of longitudinal follow-up. Results: Of 2,637 GPR40-active compounds in ChEMBL 36, 526 (19.9%) demonstrated EC50 < 100 nM and 102 (3.9%) demonstrated EC50 < 10 nM. Eight compounds met stringent drug-likeness criteria (Ro5 violations = 0, aLogP < 5.0, EC50 ≤ 1 nM). The lead compound (CHEMBL4859651) exhibited EC50 = 0.04 nM (8.75-fold more potent than Fasiglifam), MW = 297 Da (43% lower), and aLogP = 4.30 (19% lower), with zero Ro5 violations. Mean MW of the eight candidates was 317 ± 28 Da versus 524 Da for Fasiglifam. A parallel GCK analysis identified a protein-protein interaction target (CHEMBL3885579, GCK-GKRP interface) harboring 40 exclusive compounds as an orthogonal strategy for partial GCK activation. Conclusions: Systematic cheminformatic analysis reveals that compounds with substantially superior activity and drug-likeness profiles relative to Fasiglifam exist within ChEMBL 36. Fasiglifam's hepatotoxicity is attributable to compound-specific physicochemical properties, not GPR40-mediated toxicity. RWE patient stratification may further mitigate hepatotoxicity risk for next-generation GPR40 agonists. These findings argue for systematic reappraisal of GPR40 as a viable therapeutic target for T2DM.
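The stringent filter named in this abstract (zero Ro5 violations, aLogP < 5.0, EC50 ≤ 1 nM) can be sketched as follows. The Rule of Five thresholds are the conventional ones (MW ≤ 500 Da, logP ≤ 5, HBD ≤ 5, HBA ≤ 10). The `Compound` record, the helper names, and the three example compounds are hypothetical and do not come from ChEMBL; only the MW and aLogP values used in the percentage check at the end are taken from the abstract.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    ec50_nm: float   # potency in nM
    mw: float        # molecular weight in Da
    alogp: float     # calculated lipophilicity
    hbd: int         # hydrogen bond donors
    hba: int         # hydrogen bond acceptors

def ro5_violations(c):
    """Count violations of the conventional Rule of Five thresholds."""
    return sum([c.mw > 500, c.alogp > 5, c.hbd > 5, c.hba > 10])

def passes_stringent_filter(c):
    """Criteria quoted in the abstract: Ro5 = 0, aLogP < 5.0, EC50 <= 1 nM."""
    return ro5_violations(c) == 0 and c.alogp < 5.0 and c.ec50_nm <= 1.0

def percent_lower(reference, value):
    """How much lower `value` is than `reference`, in percent."""
    return 100 * (reference - value) / reference

# Hypothetical compounds for illustration only.
library = [
    Compound("cmpd_A", ec50_nm=0.5, mw=310.0, alogp=4.2, hbd=1, hba=4),
    Compound("cmpd_B", ec50_nm=0.8, mw=530.0, alogp=5.4, hbd=1, hba=6),
    Compound("cmpd_C", ec50_nm=25.0, mw=350.0, alogp=3.9, hbd=2, hba=5),
]

hits = [c.name for c in library if passes_stringent_filter(c)]
print("passing compounds:", hits)

# Check the relative differences quoted for the lead compound vs Fasiglifam.
print(f"MW lower by {percent_lower(524, 297):.1f}%")
print(f"aLogP lower by {percent_lower(5.31, 4.30):.1f}%")
```

In this toy library, `cmpd_B` fails on two Ro5 criteria (MW and aLogP) and `cmpd_C` fails on potency, which shows why potency and drug-likeness are filtered jointly rather than in isolation.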

20

Equitable Health Intelligence: An Open Benchmark of Multi-Population Machine Learning for Omics-Based Cancer Prognosis

Sharma, T.; Chopra, A. P.; Agrawal, L.; Verma, N. K.; Starlard-Davenport, A.; Wang, J.; Hayes, D. N.; Cui, Y.

2026-06-02 bioinformatics 10.64898/2026.05.29.728755 medRxiv

Top 0.6%

1.4%

Show abstract

Purpose: Machine learning (ML) models for omics-based cancer prognosis are often trained on data from predominantly European-ancestry populations, producing biased predictions for other populations and undermining equitable genomic medicine. Existing fairness benchmarks mainly focus on outcome parity rather than predictive performance parity across populations. Public benchmark resources are needed for systematically detecting and mitigating such performance disparities in multi-population cancer prognosis. Methods: We developed Equitable Health Intelligence (EHI, https://ehiportal.org), an open-source benchmark of multi-population ML for omics-based cancer prognosis. EHI contains 1,475 ML tasks across 40 cancer/pan-cancer types, 4 omics feature sets, 4 clinical endpoints, 5 event-time thresholds, and 3 data-disadvantaged population (DDP) groups relative to a majority European-ancestry population group. Deep neural network models are trained under three multi-population ML schemes (Mixture, Independent, and Transfer Learning), with Naive Transfer included as a no-adaptation control, comprising a total of 10,325 ML experiments. Results: The EHI platform provides an interactive environment with visualization and exploratory tools for users to inspect predictive performance disparities between the majority European-ancestry group and data-disadvantaged populations, evaluate the extent to which transfer learning mitigates these disparities, and examine the impact of feature engineering methods across cancer types, omics features, and clinical endpoints. Conclusion: EHI is an open, interactive, and extensible benchmark for identifying and addressing performance disparities in multi-population ML for omics-based cancer prognosis. It provides a foundation for a growing ecosystem of methods targeting ML performance disparities arising from biomedical data inequality and population-level distribution shifts, thereby advancing equitable AI in precision oncology.